import pandas as pd
import numpy as np
import matplotlib as mpl
import plotly as plotly
plotly.__version__
'5.9.0'
This project will explore how the residential building permit value varies in different communities within Calgary. I am focusing on three types of residential building permits: commercial/multi-family, single construction, and residential improvement, to see if the value of these permits will reflect the housing trend (large apartment complex versus single-family home versus home improvement) in each community and how the trends change before, during, and after COVID-19 outbreak from 2018-2023. Since there are over 300 small communities around Calgary, we will divide those communities into eight zones (center, north, south, east, west, northwest, northeast, and southeast).
There are two datasets that I used from the Open Calgary website. My main dataset is Building Permits, and my secondary dataset is Community Crime and Disorder Statistics(2012-2019), which I use to create the community-to-sector map dictionary. I filter the Building Permits dataset into three datasets based on three types of residential building permits: New-built Commercial/Multi-Family, New-built Single-Family, and Residential improvement.
Data wrangling tasks are dropping unused columns, changing some column types, creating community-to-sector maps, adding sector columns to the three residential permit datasets, and removing rows with missing information.
Data visualization tasks are grouping the data by year and sector, calculating the yearly permit value, preplot the data for verification, and creating tables for plotting stack bar charts in Plotly.
I use the built-in query service on Open Data Calgary to get three datasets for each residential permit type: New-built Commercial/Multi-Family, New-built Single-Family, and Residential improvement. I also downloaded Community Crime and Disorder Statistics(2012-2019) without query filtering.
Query description:
IssuedDate,PermitClassMapped, PermitType, WorkClass, HousingUnitsBuilding_Permits6625L5Multi.csv, Building_Permits131191L5Single.csv, Building_Permits179416L3Improve.csv# Load Commerical/Multi-Family permit, select columns, convert float Ward Boundaries column to string Ward column.
source_df = pd.read_csv('Building_Permits6625L5Multi.csv')
Multi_df = source_df[['IssuedDate', 'EstProjectCost', 'HousingUnits','CommunityName','Ward Boundaries']]
Ward = Multi_df['Ward Boundaries'].astype(str)
pd.DataFrame(Ward)
Ward = Ward.rename("Ward")
Multi_df = pd.concat([Multi_df,Ward],axis =1)
Multi_df = Multi_df.drop("Ward Boundaries",axis='columns')
display(Multi_df.head())
# repeat above process with Single Familty permit. I don't need HousingUnits because this count as 1 unit.
sources_df = pd.read_csv('Building_Permits131191L5Single.csv')
Single_df = sources_df[['IssuedDate', 'EstProjectCost','CommunityName','Ward Boundaries']]
Wards = Single_df['Ward Boundaries'].astype(str)
pd.DataFrame(Wards)
Wards = Wards.rename("Ward")
Single_df = pd.concat([Single_df,Wards],axis =1)
Single_df = Single_df.drop("Ward Boundaries",axis='columns')
display(Single_df.head())
# repeat above process with Residential Improvement permit. I don't need HousingUnits because this count as 0 unit.
sourcei_df = pd.read_csv('Building_Permits179416L3Improve.csv')
Improv_df = sourcei_df[['IssuedDate', 'EstProjectCost','CommunityName','Ward Boundaries']]
Wardi = Improv_df['Ward Boundaries'].astype(str)
pd.DataFrame(Wardi)
Wardi = Wardi.rename("Ward")
Improv_df = pd.concat([Improv_df,Wardi],axis =1)
Improv_df = Improv_df.drop("Ward Boundaries",axis='columns')
display(Improv_df.head())
| IssuedDate | EstProjectCost | HousingUnits | CommunityName | Ward | |
|---|---|---|---|---|---|
| 0 | 10/2/23 | 580,925 | 2 | SAGE HILL | nan |
| 1 | 10/2/23 | 1,409,387 | 4 | SAGE HILL | nan |
| 2 | 10/2/23 | 1,366,160 | 2 | PARKHILL | nan |
| 3 | 10/2/23 | 1,926,194 | 8 | MAHOGANY | nan |
| 4 | 10/2/23 | 969,508 | 4 | MAHOGANY | nan |
| IssuedDate | EstProjectCost | CommunityName | Ward | |
|---|---|---|---|---|
| 0 | 2022/11/28 | 373,884 | CAMBRIAN HEIGHTS | nan |
| 1 | 2021/11/10 | NaN | WEST HILLHURST | 7.0 |
| 2 | 2022/11/09 | 375,787 | BELMONT | nan |
| 3 | 2022/05/13 | 315,574 | AMBLETON | 3.0 |
| 4 | 2022/11/30 | 259,719 | MAHOGANY | nan |
| IssuedDate | EstProjectCost | CommunityName | Ward | |
|---|---|---|---|---|
| 0 | 10/2/23 | 3,942 | CARRINGTON | nan |
| 1 | 10/2/23 | 5,000 | MAYLAND HEIGHTS | nan |
| 2 | 10/2/23 | 55,549 | CRANSTON | nan |
| 3 | 10/2/23 | 3,000 | GLENDALE | nan |
| 4 | 10/2/23 | 46,677 | AMBLETON | nan |
Now, I check if the EstProjectCost column of each dataset has NAN or 0. If so, I will remove them. Only single-family and residential improvement permit dataset has NAN in their EstProjectCost column. Then, I checked the CommunityName column of each dataset and found no NAN, so I am good to go.
Multi_df['EstProjectCost'].isnull().any()
Single_df['EstProjectCost'].isnull().any()
Improv_df['EstProjectCost'].isnull().any()
Improv_df = Improv_df.dropna(subset=['EstProjectCost'])
Single_df = Single_df.dropna(subset=['EstProjectCost'])
Multi_df['CommunityName'].isnull().any()
Single_df['CommunityName'].isnull().any()
Improv_df['CommunityName'].isnull().any()
Single_df['EstProjectCost'].isnull().any()
Improv_df['EstProjectCost'].isnull().any()
False
Now I have to map community name to eight sectors, which is a bit tricky than mapping ward boundaries to eight sectors, because there are only 14 wards but over 300 communities. But there are a lot of missing data in the ward coulmn and none in community name I have no choice. Luckily, I can extract the communities and sectors map from Community Crime and Disorder Statistics 2012-2019 dataset which contain most of the communites in my Building Permit dataset. I filter unique community name along with Sector from the dataset, check for missing values, and change the Community Name columns to match the column name in my main dataset so I can merge them.
#Update Community to Sectors Map
Crime_df = pd.read_csv('Community_Crime_and_Disorder_Statistics__2012-2019_.csv')
CrimeUnique_df = Crime_df.drop_duplicates(subset = "Community Name")
Commumap_df = CrimeUnique_df[['Community Name','Sector']]
Commumap_df = Commumap_df.rename(columns={"Community Name": "CommunityName"})
Commumap_df.isnull().any()
display(Commumap_df.head())
# Add sector column by merging with community map using community name as merge key
MultiWCommu_df = Multi_df.merge(Commumap_df, on='CommunityName', how='left').sort_values(by='CommunityName')
display(MultiWCommu_df)
| CommunityName | Sector | |
|---|---|---|
| 0 | THORNCLIFFE | NORTH |
| 1 | WOODBINE | SOUTH |
| 2 | WILLOW PARK | SOUTH |
| 4 | LINCOLN PARK | WEST |
| 5 | RAMSAY | CENTRE |
| IssuedDate | EstProjectCost | HousingUnits | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|---|
| 3809 | 7/10/13 | 849,474 | 4 | ALBERT PARK/RADISSON HEIGHTS | 12.0 | EAST |
| 3451 | 4/11/14 | 16,175,000 | 100 | ALBERT PARK/RADISSON HEIGHTS | 12.0 | EAST |
| 2722 | 9/7/16 | 2,223,322 | 23 | ALBERT PARK/RADISSON HEIGHTS | 12.0 | EAST |
| 1556 | 2/21/20 | 875,000 | 4 | ALBERT PARK/RADISSON HEIGHTS | 12.0 | EAST |
| 1658 | 11/5/19 | 900,000 | 4 | ALBERT PARK/RADISSON HEIGHTS | 12.0 | EAST |
| ... | ... | ... | ... | ... | ... | ... |
| 127 | 6/26/23 | 834,684 | 4 | YORKVILLE | nan | NaN |
| 1675 | 10/25/19 | 1,135,741 | 4 | YORKVILLE | 6.0 | NaN |
| 129 | 6/26/23 | 827,467 | 4 | YORKVILLE | nan | NaN |
| 1814 | 6/19/19 | 1,114,523 | 4 | YORKVILLE | 6.0 | NaN |
| 1848 | 5/16/19 | 1,135,741 | 4 | YORKVILLE | 6.0 | NaN |
6626 rows × 6 columns
Seem like the Community to sector map still missing some community Name, I have to take a look and manaually update it. Looks like I am missing 10 community. So I search for missing community update it. So now I have created a new cvs file, Commap.cvs and use it to add sector column to all three dataset.
# Update Community map manually and reload it
Commu2SectorMap_df = pd.read_csv('Commumap.csv')
# Merge again and test if there still NaN in sector column
MultiWCommu_df = Multi_df.merge(Commu2SectorMap_df, on='CommunityName', how='left').sort_values(by='CommunityName')
narow = MultiWCommu_df.loc[MultiWCommu_df.Sector.isnull()]
display(narow)
SingleWCommu_df = Single_df.merge(Commu2SectorMap_df, on='CommunityName', how='left').sort_values(by='CommunityName')
narows = SingleWCommu_df.loc[SingleWCommu_df.Sector.isnull()]
display(narows)
ImprovWCommu_df = Improv_df.merge(Commu2SectorMap_df, on='CommunityName', how='left').sort_values(by='CommunityName')
narowi = ImprovWCommu_df.loc[ImprovWCommu_df.Sector.isnull()]
display(narowi)
| IssuedDate | EstProjectCost | HousingUnits | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|---|
| 1359 | 11/6/20 | 770,785 | 4 | Undefined | 3.0 | NaN |
| 1481 | 6/23/20 | 1,992,860 | 10 | Undefined | 3.0 | NaN |
| 953 | 12/3/21 | 1,064,277 | 4 | Undefined | 4.0 | NaN |
| 1345 | 12/2/20 | 9,602,466 | 64 | Undefined | 3.0 | NaN |
| 1984 | 12/18/18 | 477,806 | 6 | Undefined | 6.0 | NaN |
| 1264 | 3/19/21 | 1,542,523 | 10 | Undefined | 6.0 | NaN |
| 2491 | 7/4/17 | 20,261,732 | 162 | Undefined | 4.0 | NaN |
| 1339 | 12/16/20 | 1,101,602 | 6 | Undefined | 3.0 | NaN |
| 939 | 12/13/21 | 1,546,871 | 6 | Undefined | 4.0 | NaN |
| 1268 | 3/16/21 | 907,963 | 5 | Undefined | 11.0 | NaN |
| 1272 | 3/9/21 | 809,223 | 5 | Undefined | 11.0 | NaN |
| 1986 | 12/18/18 | 567,662 | 8 | Undefined | 6.0 | NaN |
| 1301 | 2/9/21 | 916,680 | 5 | Undefined | 11.0 | NaN |
| 1140 | 6/15/21 | 13,133,900 | 60 | Undefined | 5.0 | NaN |
| 1987 | 12/18/18 | 325,791 | 4 | Undefined | 6.0 | NaN |
| 928 | 12/24/21 | 1,044,115 | 4 | Undefined | 4.0 | NaN |
| 1170 | 5/31/21 | 10,692,300 | 48 | Undefined | 5.0 | NaN |
| 2301 | 3/19/18 | 13,107,065 | 83 | Undefined | 4.0 | NaN |
| 1160 | 6/4/21 | 762,473 | 4 | Undefined | 6.0 | NaN |
| 1371 | 10/27/20 | 1,124,995 | 6 | Undefined | 3.0 | NaN |
| 843 | 3/17/22 | 1,515,280 | 9 | Undefined | 4.0 | NaN |
| 757 | 5/4/22 | 2,833,228 | 12 | Undefined | 4.0 | NaN |
| 1370 | 10/30/20 | 770,785 | 4 | Undefined | 3.0 | NaN |
| 1206 | 5/10/21 | 1,479,205 | 8 | Undefined | 3.0 | NaN |
| 937 | 12/14/21 | 1,546,871 | 6 | Undefined | 4.0 | NaN |
| 1207 | 5/7/21 | 1,479,205 | 8 | Undefined | 3.0 | NaN |
| 838 | 3/25/22 | 1,999,630 | 15 | Undefined | 4.0 | NaN |
| 1443 | 8/12/20 | 1,134,102 | 7 | Undefined | 6.0 | NaN |
| 950 | 12/7/21 | 1,827,288 | 7 | Undefined | 4.0 | NaN |
| 1447 | 8/10/20 | 1,025,547 | 6 | Undefined | 6.0 | NaN |
| 904 | 1/26/22 | 11,742,885 | 47 | Undefined | 4.0 | NaN |
| 905 | 1/26/22 | 6,118,078 | 47 | Undefined | 4.0 | NaN |
| 1365 | 11/4/20 | 947,765 | 5 | Undefined | 3.0 | NaN |
| 800 | 4/22/22 | 3,244,257 | 12 | Undefined | 4.0 | NaN |
| 2007 | 12/5/18 | 1,500,000 | 4 | Undefined | 7.0 | NaN |
| IssuedDate | EstProjectCost | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|
| 4854 | 2023/07/18 | 326,677 | Undefined | nan | NaN |
| 3598 | 2023/06/20 | 427,996 | Undefined | nan | NaN |
| 2421 | 2023/05/31 | 262,511 | Undefined | nan | NaN |
| 529 | 2023/07/26 | 339,427 | Undefined | nan | NaN |
| 3021 | 2023/06/09 | 262,511 | Undefined | nan | NaN |
| 1953 | 2023/06/12 | 274,500 | Undefined | nan | NaN |
| 2069 | 2023/06/01 | 275,962 | Undefined | nan | NaN |
| 2775 | 2023/07/27 | 350,085 | Undefined | nan | NaN |
| 1654 | 2023/03/23 | 202,247 | Undefined | nan | NaN |
| 4950 | 2023/07/18 | 312,069 | Undefined | nan | NaN |
| 2458 | 2023/06/21 | 274,500 | Undefined | nan | NaN |
| 1223 | 2023/03/03 | 166,041 | Undefined | nan | NaN |
| 348 | 2023/07/26 | 339,427 | Undefined | nan | NaN |
| 297 | 2023/07/20 | 326,506 | Undefined | nan | NaN |
| 1850 | 2023/07/18 | 329,513 | Undefined | nan | NaN |
| 2655 | 2023/03/23 | 355,375 | Undefined | nan | NaN |
| 5141 | 2023/07/18 | 312,473 | Undefined | nan | NaN |
| 4934 | 2023/06/10 | 422,102 | Undefined | nan | NaN |
| 1807 | 2023/03/23 | 165,854 | Undefined | nan | NaN |
| 3054 | 2023/03/17 | 165,854 | Undefined | nan | NaN |
| 1864 | 2023/07/18 | 306,008 | Undefined | nan | NaN |
| 300 | 2023/05/31 | 432,076 | Undefined | nan | NaN |
| 3328 | 2023/06/02 | 272,168 | Undefined | nan | NaN |
| 4917 | 2023/06/20 | 272,168 | Undefined | nan | NaN |
| 2419 | 2023/05/31 | 262,511 | Undefined | nan | NaN |
| 2413 | 2023/09/28 | 327,038 | Undefined | nan | NaN |
| 5321 | 2023/06/29 | 271,790 | Undefined | nan | NaN |
| 3633 | 2023/07/17 | 240,560 | Undefined | nan | NaN |
| 3130 | 2023/03/17 | 202,419 | Undefined | nan | NaN |
| 2968 | 2023/07/11 | 356,430 | Undefined | nan | NaN |
| 2063 | 2023/06/15 | 275,530 | Undefined | nan | NaN |
| 5262 | 2023/06/29 | 311,721 | Undefined | nan | NaN |
| IssuedDate | EstProjectCost | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|
| 3078 | 6/12/23 | 40,505 | Undefined | nan | NaN |
| 2730 | 6/21/23 | 44,555 | Undefined | nan | NaN |
The Undefined and NaN community name causes the NaN in the column sector. Only the Muti-Commercial dataset has ward information for those Undefined communities. Since the Single-Family dataset has 32 entries with no community name and ward information, and the Residential improvement dataset has 14, I decided to remove those entries. However, since the Muti-Commercial dataset has ward information, I decided to replace Undefined CommunityName with their ward number and add the ward number into my Commu2SectorMap data frame so I can merge them again to get rid of NaN in the sector column.
I also convert the AppliedDate column to Pandas date-time type and the EstProjectCost column to integer.
#replacing the Undefined community name with Ward.
Multi_df.CommunityName =np.where(Multi_df.CommunityName == 'Undefined',
Multi_df['Ward'],Multi_df.CommunityName)
# Create Ward dictinary and add it into the Community to Sector Map
WardDict = {'CommunityName':['1.0','2.0','3.0','4.0','5.0','6.0','7.0','8.0','9.0','10.0','11.0','12.0','13.0','14.0'],
'Sector':['NORTHWEST','NORTH','NORTH','NORTH','NORTHEAST','WEST','CENTRE','CENTRE','EAST','NORTHEAST',
'SOUTH','SOUTHEAST','SOUTH','SOUTH']}
WardDict_df = pd.DataFrame(WardDict)
Commu2SectorMap_df = pd.concat([Commu2SectorMap_df, WardDict_df], ignore_index = True)
# Merge again to get rid of NaN in Sector column of MutiWCommu. Don't need to do this with other 2 since I have to drop it
MultiWCommu_df = Multi_df.merge(Commu2SectorMap_df, on='CommunityName', how='left').sort_values(by='CommunityName')
# Drop the row with Undefined community and missing ward information
SingleWCommu_df = SingleWCommu_df.dropna(subset=['Sector'])
# Drop the row with Undefined community and missing ward information
ImprovWCommu_df = ImprovWCommu_df.dropna(subset=['Sector'])
# Change issue date to date-time type
MultiWCommu_df['IssuedDate'] = pd.to_datetime(MultiWCommu_df['IssuedDate'])
SingleWCommu_df['IssuedDate'] = pd.to_datetime(SingleWCommu_df['IssuedDate'])
ImprovWCommu_df['IssuedDate'] = pd.to_datetime(ImprovWCommu_df['IssuedDate'])
# Change the EstProjectCost column from string to int. I need to remove the comma first so I can convert to int
MultiWCommu_df['EstProjectCost'] = MultiWCommu_df['EstProjectCost'].str.split(',').str.join('').astype(int)
SingleWCommu_df['EstProjectCost'] = SingleWCommu_df['EstProjectCost'].str.split(',').str.join('').astype(int)
ImprovWCommu_df['EstProjectCost'] = ImprovWCommu_df['EstProjectCost'].str.split(',').str.join('').astype(int)
display(MultiWCommu_df)
display(SingleWCommu_df)
display(ImprovWCommu_df)
| IssuedDate | EstProjectCost | HousingUnits | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|---|
| 1268 | 2021-03-16 | 907963 | 5 | 11.0 | 11.0 | SOUTH |
| 1272 | 2021-03-09 | 809223 | 5 | 11.0 | 11.0 | SOUTH |
| 1301 | 2021-02-09 | 916680 | 5 | 11.0 | 11.0 | SOUTH |
| 1206 | 2021-05-10 | 1479205 | 8 | 3.0 | 3.0 | NORTH |
| 1371 | 2020-10-27 | 1124995 | 6 | 3.0 | 3.0 | NORTH |
| ... | ... | ... | ... | ... | ... | ... |
| 1863 | 2019-05-02 | 1158342 | 4 | YORKVILLE | 6.0 | SOUTH |
| 1878 | 2019-04-11 | 681195 | 4 | YORKVILLE | 6.0 | SOUTH |
| 1879 | 2019-04-11 | 1125554 | 4 | YORKVILLE | 6.0 | SOUTH |
| 1848 | 2019-05-16 | 1135741 | 4 | YORKVILLE | 6.0 | SOUTH |
| 1332 | 2020-12-30 | 1461013 | 8 | YORKVILLE | 6.0 | SOUTH |
6626 rows × 6 columns
| IssuedDate | EstProjectCost | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|
| 80714 | 2011-02-25 | 151527 | ABBEYDALE | 12.0 | NORTHEAST |
| 80037 | 2011-07-14 | 441365 | ACADIA | 10.0 | SOUTH |
| 91126 | 2013-05-30 | 362893 | ACADIA | 10.0 | SOUTH |
| 106298 | 2016-07-26 | 323962 | ACADIA | 10.0 | SOUTH |
| 103754 | 2015-10-23 | 301952 | ACADIA | 10.0 | SOUTH |
| ... | ... | ... | ... | ... | ... |
| 1463 | 2023-09-21 | 353680 | YORKVILLE | nan | SOUTH |
| 121788 | 2020-11-18 | 193632 | YORKVILLE | 6.0 | SOUTH |
| 5336 | 2023-08-02 | 432043 | YORKVILLE | nan | SOUTH |
| 123918 | 2021-02-02 | 194845 | YORKVILLE | 6.0 | SOUTH |
| 125150 | 2021-06-28 | 432043 | YORKVILLE | 6.0 | SOUTH |
130736 rows × 5 columns
| IssuedDate | EstProjectCost | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|
| 91844 | 2012-10-18 | 15000 | ABBEYDALE | 12.0 | NORTHEAST |
| 2055 | 2023-07-11 | 2000 | ABBEYDALE | nan | NORTHEAST |
| 172338 | 2000-08-30 | 9570 | ABBEYDALE | 12.0 | NORTHEAST |
| 148432 | 2005-05-17 | 9870 | ABBEYDALE | 12.0 | NORTHEAST |
| 72803 | 2015-02-26 | 122500 | ABBEYDALE | 12.0 | NORTHEAST |
| ... | ... | ... | ... | ... | ... |
| 5811 | 2023-02-21 | 28298 | YORKVILLE | nan | SOUTH |
| 14502 | 2022-02-24 | 34798 | YORKVILLE | 6.0 | SOUTH |
| 34746 | 2020-01-28 | 23900 | YORKVILLE | 6.0 | SOUTH |
| 26010 | 2020-11-18 | 12858 | YORKVILLE | 6.0 | SOUTH |
| 24173 | 2021-02-12 | 13755 | YORKVILLE | 6.0 | SOUTH |
175126 rows × 5 columns
Finally, my data-wrangling tasks are a success! Now, I will start to prepare my data for visualization.
# Group by Year so I can get yearly EstProjectCost for each sector
MultiYear = MultiWCommu_df.groupby([MultiWCommu_df.IssuedDate.dt.year,'Sector'],as_index=True)['EstProjectCost'].sum()
display(MultiYear)
SingleYear = SingleWCommu_df.groupby([SingleWCommu_df.IssuedDate.dt.year,'Sector'],as_index=True)['EstProjectCost'].sum()
display(SingleYear)
ImprovYear = ImprovWCommu_df.groupby([ImprovWCommu_df.IssuedDate.dt.year,'Sector'],as_index=True)['EstProjectCost'].sum()
display(ImprovYear)
IssuedDate Sector
2000 CENTRE 114337522
EAST 16098198
NORTH 7616974
NORTHEAST 7357570
NORTHWEST 25252600
...
2023 NORTHEAST 111710577
NORTHWEST 117496901
SOUTH 165090967
SOUTHEAST 137599026
WEST 318603071
Name: EstProjectCost, Length: 189, dtype: int64
IssuedDate Sector
2000 CENTRE 23228708
EAST 1557527
NORTH 91507117
NORTHEAST 84846542
NORTHWEST 194974840
...
2023 NORTHEAST 182845981
NORTHWEST 94970150
SOUTH 233156378
SOUTHEAST 184531020
WEST 56401500
Name: EstProjectCost, Length: 192, dtype: int64
IssuedDate Sector
2000 CENTRE 12796193
EAST 1970782
NORTH 5091377
NORTHEAST 4867353
NORTHWEST 7278539
...
2023 NORTHEAST 54223545
NORTHWEST 35124127
SOUTH 57943309
SOUTHEAST 36279922
WEST 26458502
Name: EstProjectCost, Length: 192, dtype: int64
I will first plot them seperately to see the overall scale.
import matplotlib.pyplot as plt
plt.suptitle('Muti-Commercial Residential Permit value by sector')
plt.subplot(2,3,1)
MultiYear[2023].plot.bar()
plt.title('2023')
plt.xticks(visible=False)
plt.xlabel('')
plt.ylabel('Permit value $')
plt.subplot(2,3,2)
MultiYear[2022].plot.bar()
plt.title('2022')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,3)
MultiYear[2021].plot.bar()
plt.title('2021')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,4)
MultiYear[2020].plot.bar()
plt.title('2020')
plt.ylabel('Permit value $')
plt.subplot(2,3,5)
MultiYear[2019].plot.bar()
plt.title('2019')
plt.subplot(2,3,6)
MultiYear[2018].plot.bar()
plt.title('2018')
plt.tight_layout()
plt.suptitle('Single-Family Residential Permit value by sector')
plt.subplot(2,3,1)
SingleYear[2023].plot.bar()
plt.title('2023')
plt.xticks(visible=False)
plt.xlabel('')
plt.ylabel('Permit value $')
plt.subplot(2,3,2)
SingleYear[2022].plot.bar()
plt.title('2022')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,3)
SingleYear[2021].plot.bar()
plt.title('2021')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,4)
SingleYear[2020].plot.bar()
plt.title('2020')
plt.ylabel('Permit value $')
plt.subplot(2,3,5)
SingleYear[2019].plot.bar()
plt.title('2019')
plt.subplot(2,3,6)
SingleYear[2018].plot.bar()
plt.title('2018')
plt.tight_layout()
plt.suptitle('Residential Improvement Permit value by sector')
plt.subplot(2,3,1)
ImprovYear[2023].plot.bar()
plt.title('2023')
plt.xticks(visible=False)
plt.xlabel('')
plt.ylabel('Permit value $')
plt.subplot(2,3,2)
ImprovYear[2022].plot.bar()
plt.title('2022')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,3)
ImprovYear[2021].plot.bar()
plt.title('2021')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,4)
ImprovYear[2020].plot.bar()
plt.title('2020')
plt.ylabel('Permit value $')
plt.subplot(2,3,5)
ImprovYear[2019].plot.bar()
plt.title('2019')
plt.subplot(2,3,6)
ImprovYear[2018].plot.bar()
plt.title('2018')
plt.tight_layout()
Now, I will create tables for each year with three columns(Sector, Residential Permit Type, Total Permit Value) and 24 rows and plot the permit value of three permit types together in one plot for each year.
I will also create line plots to show each type of permit value over time for the 8 eight sectors. To do this, I will create tables for each permit type with three columns( Year, Sector, Total Permit Value) and 48 rows.
S2018 = SingleYear[2018][0:8]
S2019 = SingleYear[2019][0:8]
S2020 = SingleYear[2020][0:8]
S2021 = SingleYear[2021][0:8]
S2022 = SingleYear[2022][0:8]
S2023 = SingleYear[2023][0:8]
Svec = ['Single-Family']*8
M2018 = MultiYear[2018][0:8]
M2019 = MultiYear[2019][0:8]
M2020 = MultiYear[2020][0:8]
M2021 = MultiYear[2021][0:8]
M2022 = MultiYear[2022][0:8]
M2023 = MultiYear[2023][0:8]
Mvec = ['Muti-Commercial']*8
I2018 = ImprovYear[2018][0:8]
I2019 = ImprovYear[2019][0:8]
I2020 = ImprovYear[2020][0:8]
I2021 = ImprovYear[2021][0:8]
I2022 = ImprovYear[2022][0:8]
I2023 = ImprovYear[2023][0:8]
Ivec = ['Residential Improvement']*8
ALLsector_vec = ['CENTRE','EAST','NORTH','NORTHEAST','NORTHWEST','SOUTH','SOUTHEAST','WEST']*3
ALLtype_vec = Svec + Mvec + Ivec
ALLvalue_2018 = pd.concat([S2018,M2018,I2018], axis = 0)
ALLvalue_2018list = ALLvalue_2018.tolist()
Table2018 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2018list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
ALLvalue_2019 = pd.concat([S2019,M2019,I2019], axis = 0)
ALLvalue_2019list = ALLvalue_2019.tolist()
Table2019 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2019list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
ALLvalue_2020 = pd.concat([S2020,M2020,I2020], axis = 0)
ALLvalue_2020list = ALLvalue_2020.tolist()
Table2020 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2020list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
ALLvalue_2021 = pd.concat([S2021,M2021,I2021], axis = 0)
ALLvalue_2021list = ALLvalue_2021.tolist()
Table2021 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2021list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
ALLvalue_2022 = pd.concat([S2022,M2022,I2022], axis = 0)
ALLvalue_2022list = ALLvalue_2022.tolist()
Table2022 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2022list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
ALLvalue_2023 = pd.concat([S2023,M2023,I2023], axis = 0)
ALLvalue_2023list = ALLvalue_2023.tolist()
Table2023 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2023list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
import plotly.express as px
from plotly.subplots import make_subplots
# plot the long (tidy) dataframe
fig18 = px.bar(Table2018, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2018 Permit Value", barmode='stack')
fig18.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig18.show()
/Users/annesunrawee/opt/anaconda3/lib/python3.9/site-packages/scipy/__init__.py:155: UserWarning: A NumPy version >=1.18.5 and <1.25.0 is required for this version of SciPy (detected version 1.26.1
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
fig19 = px.bar(Table2019, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2019 Permit Value", barmode='stack')
fig19.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig19.show()
fig20 = px.bar(Table2020, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2020 Permit Value", barmode='stack')
fig20.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig20.show()
fig21 = px.bar(Table2021, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2021 Permit Value", barmode='stack')
fig21.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig21.show()
fig22 = px.bar(Table2022, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2022 Permit Value", barmode='stack')
fig22.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig22.show()
fig23 = px.bar(Table2023, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2023 Permit Value", barmode='stack')
fig23.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig23.show()
ALLsectoryear_vec = ['CENTRE','EAST','NORTH','NORTHEAST','NORTHWEST','SOUTH','SOUTHEAST','WEST']*6
Allyear_vec = ['2018']*8 + ['2019']*8 + ['2020']*8 + ['2021']*8 + ['2022']*8 + ['2023']*8
ALLvalue_S = pd.concat([S2018,S2019,S2020,S2021,S2022,S2023], axis = 0)
ALLvalue_Slist = ALLvalue_S.tolist()
TableS = pd.DataFrame(list(zip(Allyear_vec,ALLsectoryear_vec,ALLvalue_Slist)),
columns =['Year','Sector','Total Permit Value'])
ALLvalue_M = pd.concat([M2018,M2019,M2020,M2021,M2022,M2023], axis = 0)
ALLvalue_Mlist = ALLvalue_M.tolist()
TableM = pd.DataFrame(list(zip(Allyear_vec,ALLsectoryear_vec,ALLvalue_Mlist)),
columns =['Year','Sector','Total Permit Value'])
ALLvalue_I = pd.concat([I2018,I2019,I2020,I2021,I2022,I2023], axis = 0)
ALLvalue_Ilist = ALLvalue_I.tolist()
TableI = pd.DataFrame(list(zip(Allyear_vec,ALLsectoryear_vec,ALLvalue_Ilist)),
columns =['Year','Sector','Total Permit Value'])
figS = px.line(TableS, x="Year", y="Total Permit Value", color="Sector",text ='Total Permit Value', title="Single-Family Permit Value")
figS.update_layout(xaxis_title='Year', yaxis=dict(tickformat="$",))
figS.show()
figM = px.line(TableM, x="Year", y="Total Permit Value", color="Sector",text ='Total Permit Value', title="Multi-Commercial Permit Value")
figM.update_layout(xaxis_title='Year', yaxis=dict(tickformat="$",))
figM.show()
figI = px.line(TableI, x="Year", y="Total Permit Value", color="Sector",text ='Total Permit Value', title="Residential Improvement Permit Value")
figI.update_layout(xaxis_title='Year', yaxis=dict(tickformat="$",))
figI.show()
I will summarize the findings to my guiding question: What is the relationship between residential building permit value and different communities?
The EAST sector has the smallest permit value of all types because it is the smallest sector. However, the WEST sector, the second smallest sector, has a much higher permit value than the EAST sector.
The CENTRE sector has much higher commercial permit value due to the concentration of large-scale commercial apartments and condo complexes surrounding downtown and the SAIT campus.
The NORTHWEST sector shares the same trend with the CENTRE center, with the commercial permit value higher than the single-family permit value, partly due to the large-scale apartment and condo complex surrounding the University of Calgary campus.
The NORTH and NORTHEAST sectors have led in single-family permit value since 2018, with many new communities expanding to the north, such as Ambleton, Livingston, and Glacier Ridge. SOUTH and SOUTHEAST sectors start to catch up with them in 2021, with new neighborhoods expanding to the south, such as Seton and Legacy.
Permit value of the new-built single-family home and new-built commercial residential buildings dominate the Calgary real estate market, with residential improvement permit value being an order of magnitude less.
Residential Improvement permit value increased in 2020 for all sectors except the CENTRE sector, where the permit value slightly decreased. The SOUTH sector dominates in the residential improvement permit value throughout and peaks in 2021 and 2022.
During the COVID-19 outbreak, 2020 saw a drop in single-family permit value in 4 sectors(NORTH, SOUTHEAST, CENTRE, WEST) while slightly increasing in the other four sectors. Commercial residential building permit value dropped in 2020 in 6 sectors, with a sharp drop in the CENTRE and NORTH sectors, while the other two sectors (SOUTHEAST and WEST) increased.
Both single-family and commercial residential building permit values increased in 2021 across all sectors, except for a slight drop in the commercial residential building permit value in the SOUTH sector.
Single-family permit value increased sharply in the NORTHEAST, NORTH, SOUTH, and SOUTHEAST in 2021. Commercial residential building permit values increase sharply in the CENTRE and NORTH sectors.
Single-family permit value drop in 2023 in all sectors. Commercial residential building permit values decreased in most sectors except increasing in the WEST and EAST sectors.